Add JSON schema for test runner arguments#169
Conversation
Add strong language to always use the `run_tests` tool. Add a JSON schema for the CLI arguments to the test runner. The schema was dumped using dart-lang/core#897 Add a test for the conversion of boolean, string, and list of string arguments, as well as overriding the reporter argument.
PR HealthChangelog Entry ✔️
Changes to files need to be accounted for in their respective changelogs. |
|
I haven't gotten an eval framework set up, but my manual testing was pretty positive with this. The agent was able to use the test runner more effectively, and I was unsuccessful in attempts to coach usage in other ways like a |
| } | ||
|
|
||
| const _dartTestCliSchema = ''' | ||
| {"type":"object","properties":{"help":{"type":"boolean","description":"Show this usage information.\\ndefaults to \\\"false\\\""},"version":{"type":"boolean","description":"Show the package:test version.\\ndefaults to \\\"false\\\""},"name":{"type":"array","description":"A substring of the name of the test to run.\\nRegular expression syntax is supported.\\nIf passed multiple times, tests must match all substrings.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"plain-name":{"type":"array","description":"A plain-text substring of the name of the test to run.\\nIf passed multiple times, tests must match all substrings.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"tags":{"type":"array","description":"Run only tests with all of the specified tags.\\nSupports boolean selector syntax.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"exclude-tags":{"type":"array","description":"Don't run tests with any of the specified tags.\\nSupports boolean selector syntax.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"run-skipped":{"type":"boolean","description":"Run skipped tests instead of skipping them.\\ndefaults to \\\"false\\\""},"platform":{"type":"array","description":"The platform(s) on which to run the tests.\\n[vm (default), chrome, firefox, edge, node].\\nEach platform supports the following compilers:\\n[vm]: kernel (default), source, exe\\n[chrome]: dart2js (default), dart2wasm\\n[firefox]: dart2js (default), dart2wasm\\n[edge]: dart2js (default)\\n[node]: dart2js (default), dart2wasm\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"compiler":{"type":"array","description":"The compiler(s) to use to run tests, supported compilers are [dart2js, dart2wasm, exe, kernel, source].\\nEach platform has a default compiler but may support other compilers.\\nYou can target a compiler to a specific platform using arguments of the following form [<platform-selector>:]<compiler>.\\nIf a platform is specified but no given compiler is supported for that platform, then it will use its default compiler.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"preset":{"type":"array","description":"The configuration preset(s) to use.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"concurrency":{"type":"string","description":"The number of concurrent test suites run.\\ndefaults to \\\"8\\\""},"total-shards":{"type":"string","description":"The total number of invocations of the test runner being run."},"shard-index":{"type":"string","description":"The index of this test runner invocation (of --total-shards)."},"timeout":{"type":"string","description":"The default test timeout. For example: 15s, 2x, none\\ndefaults to \\\"30s\\\""},"ignore-timeouts":{"type":"boolean","description":"Ignore all timeouts (useful if debugging)\\ndefaults to \\\"false\\\""},"pause-after-load":{"type":"boolean","description":"Pause for debugging before any tests execute.\\nImplies --concurrency=1, --debug, and --ignore-timeouts.\\nCurrently only supported for browser tests.\\ndefaults to \\\"false\\\""},"debug":{"type":"boolean","description":"Run the VM and Chrome tests in debug mode.\\ndefaults to \\\"false\\\""},"coverage":{"type":"string","description":"Gather coverage and output it to the specified directory.\\nImplies --debug."},"chain-stack-traces":{"type":"boolean","description":"Use chained stack traces to provide greater exception details\\nespecially for asynchronous code. It may be useful to disable\\nto provide improved test performance but at the cost of\\ndebuggability.\\ndefaults to \\\"false\\\""},"no-retry":{"type":"boolean","description":"Don't rerun tests that have retry set.\\ndefaults to \\\"false\\\""},"test-randomize-ordering-seed":{"type":"string","description":"Use the specified seed to randomize the execution order of test cases.\\nMust be a 32bit unsigned integer or \\\"random\\\".\\nIf \\\"random\\\", pick a random seed to use.\\nIf not passed, do not randomize test case execution order."},"fail-fast":{"type":"boolean","description":"Stop running tests after the first failure.\\n\\ndefaults to \\\"false\\\""},"reporter":{"type":"string","description":"Set how to print test results.\\ndefaults to \\\"compact\\\"\\nallowed values: compact, expanded, failures-only, github, json, silent"},"file-reporter":{"type":"string","description":"Enable an additional reporter writing test results to a file.\\nShould be in the form <reporter>:<filepath>, Example: \\\"json:reports/tests.json\\\""},"verbose-trace":{"type":"boolean","description":"Emit stack traces with core library frames.\\ndefaults to \\\"false\\\""},"js-trace":{"type":"boolean","description":"Emit raw JavaScript stack traces for browser tests.\\ndefaults to \\\"false\\\""},"color":{"type":"boolean","description":"Use terminal colors.\\n(auto-detected by default)\\ndefaults to \\\"false\\\""}},"required":[]} |
There was a problem hiding this comment.
Why is this a String, how are we intended to update it?
There was a problem hiding this comment.
I am also not sure many of these make sense to expose and are just a waste of tokens and will make overall calls less reliable?
For instance color, help, file-reporter, pause-after-load, etc?
There was a problem hiding this comment.
Why is this a String, how are we intended to update it?
We'd dump it from the test runner binary again and replace the entire String. We can change this away from a String, it will add steps to update, but I don't think that's a problem. Would you prefer a map literal?
I am also not sure many of these make sense to expose and are just a waste of tokens and will make overall calls less reliable?
For instance
color,help,file-reporter,pause-after-load, etc?
I don't think it makes sense to curate the whole list manually, but defining a blocklist is feasible. I'd worry about forgetting to add an argument sometime and leaving a gap where the client needs to use the shell even though we said not to. WDYT about a blocklist defined here? If a blocklist defined by the test runner is better we'd need to update the package:args API.
There was a problem hiding this comment.
I think file-reporter has plausible utility for a client, I added a blocklist with the others.
For the overall token usage I'm partially running on a hunch that some sort of tool filtering is likely to become a standard feature for clients. There is a huge number of tools being created and a number of discussions started about managing the volume of tools. I started exploring tool filtering by making a call with less historical context to decide which tools should offered but I didn't figure out how to evaluate the behavior impacts.
There was a problem hiding this comment.
Blocklist sounds fine to me
There was a problem hiding this comment.
I added instructions for updating this. It currently relies on unmerged PRs in package:args and package:test. I'd be fine seeing these land if someone wants to tackle it before I'm back, otherwise I think it's OK if these sit as drafts just in case they are needed and I'll revisit whether it's worth landing in a few weeks.
dart-lang/test#2508
Some CLI arguments don't make sense through this proxy
Revisions updated by `dart tools/rev_sdk_deps.dart`. ai (https://github.com/dart-lang/ai/compare/f2b48c6..12ac0a4): 12ac0a4 2025-06-18 Jacob MacDonald make the log file test less flaky by retrying deletes and reads (dart-lang/ai#180) dbedc6d 2025-06-18 Jacob MacDonald fix cruft in description (dart-lang/ai#179) 3ab9482 2025-06-18 Jacob MacDonald improve tool description for the dtd connection tool and improve error messages (dart-lang/ai#178) 4ca0ff1 2025-06-18 Jacob MacDonald Use shared Implementation type, add clientInfo field to MCPServer (dart-lang/ai#175) 885a4c5 2025-06-18 Jacob MacDonald Add `--log-file` argument to log all protocol traffic to a file (dart-lang/ai#176) 7ca3eba 2025-06-17 Nate Bosch Add JSON schema for test runner arguments (dart-lang/ai#169) core (https://github.com/dart-lang/core/compare/dc97530..b59ecf4): b59ecf4c 2025-06-18 Lasse R.H. Nielsen Optimize surrogate decoding. (dart-lang/core#894) dartdoc (https://github.com/dart-lang/dartdoc/compare/4ceea6b..f1fe177): f1fe1775 2025-06-16 Sarah Zakarias Refactor 404 error page to use div instead of p for search form (dart-lang/dartdoc#4064) ecosystem (https://github.com/dart-lang/ecosystem/compare/64aac3a..d5233c6): d5233c6 2025-06-13 dependabot[bot] Bump the github-actions group with 5 updates (dart-lang/ecosystem#351) web (https://github.com/dart-lang/web/compare/c8c1c28..4b2f02e): 4b2f02e 2025-06-18 nikeokoronkwo Add Variable Declaration Support (dart-lang/web#382) webdev (https://github.com/dart-lang/webdev/compare/661dafd..6dc3dde): 6dc3ddef 2025-06-20 Jessy Yameogo Fix duplicate connection/logs in Webdev (dart-lang/webdev#2635) 0c8a17b4 2025-06-20 Morgan :) Remove dependency overrides. (dart-lang/webdev#2634) a3218638 2025-06-16 Jessy Yameogo modifying DWDS Injector to always inject client and introduce useDwdsWebSocketConnection flag (dart-lang/webdev#2629) 2eb27546 2025-06-16 Morgan :) Prepare for `build_runner` changes. (dart-lang/webdev#2633) Change-Id: Ib323bea37dd77ed94387e77d9c504f889bfa8050 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/436021 Auto-Submit: Devon Carew <devoncarew@google.com> Reviewed-by: Konstantin Shcheglov <scheglov@google.com> Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
Add strong language to always use the
run_teststool.Add a JSON schema for the CLI arguments to the test runner. The schema
was dumped using dart-lang/core#897
Add a test for the conversion of boolean, string, and list of string
arguments, as well as overriding the reporter argument.